专利摘要:
data processing apparatus and method, method for compiling an application for execution by a data processor, computer program product, and virtual machine data processing apparatus is disclosed, having: an instruction decoder configured to decode a sequence instruction, a data processor configured to process the decoded instruction sequence; wherein in response to a plurality of adjacent instructions within the instruction execution sequence which is dependent on a data condition being met and whose execution when said data condition is not met does not change a said state of the processing apparatus, the processor stops: to begin determining whether the data condition is met or not; and begin processing said plurality of adjacent instructions; and responding to the determination that said data condition is not met; skips the next instruction to be executed after said plurality of adjacent instructions without executing any intermediate instructions of said plurality of adjacent instructions not yet executed and continues execution in the next instruction.
公开号:BR112013019824B1
申请号:R112013019824-9
申请日:2012-01-26
公开日:2021-06-15
发明作者:Alastair David Reid
申请人:Arm Limited;
IPC主号:
专利说明:

FIELD OF THE INVENTION
[0001] The Field of the Invention relates to data processing apparatus and in particular to the processing of predicted instructions which are instructions whose execution is dependent on data conditions. KNOWLEDGE OF THE INVENTION
[0002] Conditional instructions whose execution is dependent on particular data conditions are known. For example, instructions such as CMP x y, ADDGE, SUBLT, which compare two values stored in locations x and y and add them together if x is greater than or equal to y and subtract them from each other if x is less than y.
[0003] Array instructions that perform operations on multiple data elements have become more common. They often use masks to control which elements are processed. For example, executing an 8-element store array instruction using mask 10000001 only stores 2 elements (the first and the last). A common optimization when writing vector code that uses masks in this way is to recognize that a sequence of instructions are all controlled by the same mask and to insert a branch around all those instructions if the mask is zero, as in this case none of the instructions would do nothing. So the code would become: VCMP D0, D1, D2; compare Dl and D2 put resulting mask in DOVTEST D0 ; test that all bits in DO are zerosBEQ LI ; if mask is zero skip to next 10 DO operations -> VOP1 ; perform vector 1 operation under DODO VOP2 mask control; perform vector operation 2 under DO mask control DO -> VOP10 ; perform 10 vector operation under DO LI mask control
[0004] It is very effective if the mask is often zero as the testing and branching only cost two and it avoids the need to perform the 10 instructions that would be performed and the branch was absent. However, a problem with this prior art approach is that the period branching is data dependent on the fact that whether or not it is taken is dependent on two values D1 and D2. Data dependent branches are very difficult to predict and therefore if branch prediction is used to speed up the operation the branch can often be poorly predicted and if it is predicted to be taken when it should not be taken, the state of the machine will go need to be returned due to the length of the data words.
[0005] It would be desirable to be able to improve conditional statement execution performance. SUMMARY OF THE INVENTION
[0006] A first aspect of the present invention provides a data processing apparatus comprising: an instruction decoder configured to decode the sequence of instructions; a data processor configured to process said decoded instruction sequence; said data processor being configured to analyze said instruction sequence and to identify a plurality of adjacent instructions within said instruction sequence, execution of which is dependent on a data condition being encountered and whose execution when said data condition is not found does not change a value stored in a data register, and in response to identifying said plurality of adjacent instructions said data processor is configured to: begin to determine whether said data condition is met or not; and begin processing said plurality of adjacent instructions; and in response to the determination that said data condition is not met; skip to the next instruction to be executed after said plurality of adjacent instructions without executing any intermediate instructions of said plurality of adjacent instructions not yet executed and continue execution in said next instruction.
[0007] The present invention recognizes that where you have conditionally executed instructions that are all dependent on the same condition, and where your execution when the condition is not satisfied does not affect the state of a data processing apparatus, then someone can initiate processing of those instructions before the condition has been determined and if it is found that these instructions should not be executed one can at this point skip to the end of these instructions. In this regard, not changing the state of the data processing apparatus is used to mean that the data values stored in the data records are not changed, therefore, the instructions do not overwrite any stored values and as such, none state recovery needs to be performed in the case that they should not have been performed. In this regard, instruction execution will change such things as the value of a program counter and perhaps a memory for fast data retrieval, however where there is no change to values in the destination registers of the executed instructions then there is no turning back required if instructions are executed when they should not be as their execution will not affect the execution of future instructions. Therefore, where no state recovery is required one can view the state as not having changed.
[0008] Thus, one achieves the execution benefit of skipping some instructions with the associated performance and advantages without the inconvenience of having to perform additional restore steps. It should be noted that there is a small overhead associated with the circuitry required to process this “jump” step.
[0009] In some embodiments said data processing apparatus further comprises a data store, said data processing apparatus being responsive to said plurality of adjacent instructions for storing in said data store an identifier identifying the location of said next said instruction to be executed after executing said plurality of adjacent instructions as a target location and to set a skip flag within the data store, said skip flag indicating to said processor that in response to said data condition the said processor must update a program counter with an address corresponding to said target location.
[00010] Although instruction skipping can be implemented in a number of ways, in some embodiments detection of a plurality of adjacent instructions triggers the processor to set a skip flag and to store an indication of a target location, the jump indicating to the processor that when it has determined that the condition is not met it can jump to the target location. That is, it can stop executing the plurality of adjacent instructions and can continue executing an instruction at the target location, any of the adjacent instructions that were not executed at that point will not execute since at this point someone recognizes that the condition they are dependent on is not satisfied.
[00011] In some embodiments, the identifier identifying the location of said next instruction comprises an address.
[00012] The target location that the execution of instructions is to jump to can be indicated in a number of ways, in some embodiments it simply comprises an address. If this is the case, the program counter can simply be updated with this information. In other embodiments, the information could be stored as an identifier identifying the number of adjacent instructions in which case, a counter can be updated with this value and can be decremented by one each time an instruction is executed and at a point that is determined that the condition is not satisfied the program counter can be updated using the counter information.
[00013] In some embodiments, said plurality of adjacent instructions comprise vector instructions for processing vector data elements, each vector instruction comprising a mask controlling which elements in the vector data elements are processed, said data condition not being found comprising said mask comprising zeros indicating no data elements are to be processed.
[00014] This technique is particularly advantageous when used with vector instructions. When processing vector data element, if one were to use a branch instruction to branch around the conditional instructions, then in order to account for any prognostic failures the state of the processor in the branch would need to be stored. With vector data elements, storing this state is very costly as these data elements are large. Therefore, as the present technique avoids the need to always revert so that no state needs to be stored, it is very advantageous when used with vector data elements.
[00015] The processing of vector data elements is often controlled using a vector mask that controls which elements in the vector data elements are to be processed. If the mask is all zeros then this indicates that none of the elements are to be processed. Therefore, embodiments of the present invention may have as the data condition, the mask not being zeros. Therefore, if the data condition is not met, the mask is all zeros, at that point it is known that execution of the instructions will not change the state of the processor and in fact is not required and one can skip any of those instructions not yet executed. It is also known that any intermediate instructions that were executed but were not required did not change the state of the processor and thus, no return back state is required.
[00016] In some embodiments, said plurality of adjacent instructions comprises a plurality of instructions each comprising a same attribute, said attribute determining said data condition, said instructions only completing and updating any data values stored in response to the said attribute being found, said data processing apparatus being configured to: begin processing said plurality of adjacent instructions; and in response to the determination that said data condition is not met; terminate execution of said plurality of adjacent instructions so that no stored data values are updated; skip to the next instruction subsequent to said plurality of adjacent instructions without executing any intermediate instructions of said plurality of adjacent instructions not yet executed and to continue execution in said next instruction.
[00017] The present technique is also suitable for use with instructions that execute in dependence on attributes. Such instructions only complete and update a processor state once it is determined that the attribute's condition is met. However, they start processing the data before the attribute result is known. As such, these instructions can also be executed before the attribute's condition or result is known. If the attribute is not found then it will not have written any of its calculations to store it, the processor state will not have changed and although they have started their execution, no state needs to be reverted. Therefore, if a plurality of these instructions are adjacent to each other it is advantageous to start processing them and in response to the determination that the attribute condition is not met finish executing the plurality of adjacent instructions and skipping to the next instruction subsequent to them.
[00018] In some embodiments, said data processing apparatus further comprises a data store and detection circuitry; said detection circuitry being responsive to detecting a conditional branch instruction dependent on said data condition not being found to branch forward around said plurality of adjacent instructions to: store a value indicating the target location of said instruction to branch in said data store as the target location and to set the value of said skip flag within said data store; said processor being responsive to said skip flag not to take said branch and to begin determining whether said data condition is met or not; to begin execution of said plurality of adjacent instructions; and in response to determining that said data condition is not met to set a program counter to a value dependent on said target location.
[00019] As noted in the introduction to this application, a conventional way to deal with this plurality of adjacent instructions is to branch around it. Therefore, if the data processing apparatus is configured to detect the conditional branch instruction which is dependent on the data condition followed by a plurality of such adjacent instructions, it may be advantageous to instigate the skip procedure rather than take the branch . Therefore, rather than taking the branch and having to store the state in the case that the branch is poorly predicted, the data processing apparatus recognizes the instruction pattern and starts processing the adjacent instructions having set the skip flag of so that when it determines whether the condition is met or not it can, if the condition is not met, it skips to the end of these adjacent instructions, whereas if the condition is met it simply proceeds to execute all adjacent instructions and clears the skip flag.
[00020] In some embodiments, said conditional branch instruction is conditional on a vector value, said processor starts determining whether said data condition is met or not, executing a vector test instruction by testing said value of the vector and in response to said vector value being determined to comprise a predetermined value being determined that said data condition is not met.
[00021] In some embodiments, the conditional branch instruction is conditional on a vector value. In such a case the processor will begin determining whether or not the data condition is met by executing a test vector instruction and then will begin executing the plurality of adjacent instructions. When a result from the received vector test instruction, if it indicates that the condition is not met the processor will skip to the end of the adjacent instructions and continue processing from the next instruction.
[00022] In some embodiments, said data processing apparatus further comprises a data store and detection circuitry; said detection circuitry being responsive to detecting a conditional branch instruction dependent on said data condition to branch back to the beginning of said plurality of adjacent instructions to: store a location of said branch instruction incremented by one node said data store as said target location and for setting a skip flag value within the data store; said processor being responsive to said skip flag to take said extension and to determine whether said data condition is met or not; and to begin execution of said plurality of adjacent instructions; and in response to determining that said data condition is not met, setting a program counter to a value dependent on a value stored in said data store as said target location.
[00023] Embodiments of the present invention can also be used to handle backward branches where a sequence of adjacent instructions conditioned to the same condition are branched through a backward branch. In this case, when the processing apparatus recognizes this pattern of instructions, it will take the branch and start executing the adjacent instructions. However, once it is determined whether the condition is met or not, if the condition is not met then it will stop executing adjacent instructions and will continue to execute in a subsequent instruction in the subsequent program sequence for the branch it took. .
[00024] Once again, if the conditional branch instruction is conditional on an array then the processor determines whether or not the data condition is met by executing a test array instruction that tests a value of the array.
[00025] In some embodiments, the data processing apparatus further comprises a data store and detection circuitry; said detection circuitry being responsive to detecting an instruction indicating said plurality of adjacent instructions dependent on said data condition following said instruction, said instruction comprising said data condition and an indication whether it is to be met or not for each statement to be completed; to store an identifier identifying the location of said next instruction following said plurality of instructions with said target location and to set a skip flag value within the data store; said processor being responsive to said skip to jump flag to said target location in response to the determination that said data condition is not met.
[00026] In some embodiments there is detection circuitry provided to detect a pattern of instructions where implementing the skip function would be advantageous.
[00027] In some embodiments, in response to the determination that said data condition is met, said processor is configured to continue executing said plurality of adjacent instructions and to clear said skip flag.
[00028] If the processor determines that the data condition is met then the plurality of adjacent instructions need to be executed, in which case the processor continues to execute them and it clears the skip flag.
[00029] In some embodiments, in response to execution of all said adjacent instructions before said data condition is determined said processor is configured to clear said skip flag.
[00030] Furthermore, if the data condition is not determined before all adjacent instructions have been executed, then it is no longer relevant as to whether or not the processor should skip and the skip flag should be cleared in this case as well.
[00031] A second aspect of the present invention provides a data processing apparatus comprising: an instruction decoder configured to decode at least one skip instruction, the said at least one skip instruction specifying a target register and a condition condition register. Dice; a data processor configured to perform data processing operations controlled by said instruction decoder wherein: said data processor is responsive to at least one said decoded skip instruction to begin determining said data condition and to begin processing subsequent instructions within the sequence of said instruction, and in response to determining that said data condition does not have a predetermined result to set a program counter to a value dependent on a value stored in said target register.
[00032] Although the data processor can generate the functionality of a skip instruction having detected a pattern of instructions, in some embodiments the skip instruction is an effective instruction that the processor will decode and execute and then perform the skip functionality.
[00033] In some embodiments, said instruction decoder is responsive to at least one said skip instruction to set a skip flag in said data store.
[00034] The decoder can respond to the skip instruction in a number of ways, but in some embodiments it sets a skip flag which indicates to the processor that it is expecting the result of the determination considering the data condition, and that since it has that result if the condition is not met then it needs to perform the skip function.
[00035] A third aspect of the present invention provides a data processing apparatus comprising: an instruction decoder configured to decode at least one ring instruction, the at least one said ring instruction specifying a target register and a condition register. data, a data processor configured to perform data processing operations controlled by said instruction decoder wherein: said data processor is responsive to at least one said decoded ring instruction to begin determining said data condition and to setting said program counter to an address of an instruction previously executed at a start of said loop and to begin processing instructions within said sequence of said instruction, and in response to the determination that said data condition does not have a predetermined result for configure a program counter for a va value dependent on a value stored in said target record.
[00036] An additional instruction that may be available is the ring instruction, which allows you to skip adjacent instructions that take a backward branch once it is determined that the condition they are dependent on is not met.
[00037] A fourth aspect of the present invention provides a method for compiling an application for execution by a data processor comprising the steps of: analyzing said plurality of instructions within the application; generate a sequence of instructions including the skip instruction followed by a plurality of adjacent instructions, execution of which is dependent on the same data condition and whose execution when said data condition is not met does not change a value stored in a data register. data, said skip instruction specifying a target address, said target address being an address of the next instruction subsequent to said adjacent instructions and said data condition, said skip instruction when executed by said data processor controlling said processor of data to begin the determination of whether said data condition is met or not; and starting processing of said plurality of adjacent instructions; and in response to the determination that said data condition is met; skipping to the next instruction subsequent to said plurality of adjacent instructions without executing which intermediate instructions of said plurality of adjacent instructions not yet executed to continue execution of said next instruction; and in response to the determination that said data condition is not met to continue executing said plurality of adjacent instructions.
[00038] A compiler may also generate a skip instruction in response to code analysis and determine that there are a plurality of adjacent instructions that are all dependent on the same data condition and execution of which if the data condition is not met it will not will change the state of the processor. In this way, the compiler provides an optimization of the code before being executed by the processor.
[00039] A fifth aspect of the present invention provides a method for compiling an application for execution by a data processor comprising the steps of: analyzing a said plurality of instructions within the application; generate a sequence of instructions including the ring instruction specifying an address of a first of a plurality of adjacent instructions to jump to, said plurality of adjacent instructions characterized in that it comprises instructions whose execution is dependent on a same data condition and whose execution when said data condition is not met does not change a value stored in a data register, said ring instruction specifying said data condition and a target address, said target address being an address of the next instruction subsequent to said ring instruction, said ring instruction when executed by said data processor controlling said data processor to begin determining whether said data condition is met or not; and jumping to and starting processing of said plurality of adjacent instructions; and in response to the determination that said data condition is met; jumping to said instruction specified by said target address without executing any intermediate instructions of said plurality of adjacent instructions not yet executed and continuing execution of said instruction; and in response to the determination that said data condition is not met to continue executing said plurality of adjacent instructions.
[00040] The compiler may also generate a ring instruction in response to detecting backward branches for the plurality of conditional adjacent instructions. Again this can improve a processor's performance by processing such instructions.
[00041] A sixth aspect of the present invention provides a computer program product for controlling an apparatus for carrying out a method according to a fifth or sixth aspect of the present invention.
[00042] A seventh aspect of the present invention provides a data processing method characterized in that it comprises the following steps: decoding a sequence of instructions; processing said decoded instruction sequence; in response to a plurality of adjacent instructions within said instruction sequence, execution of which is dependent on a data condition being met and whose execution when said data condition is not met does not change a value stored in a data record: begin to determine whether said data condition is met or not; and starting to process said plurality of adjacent instructions; and in response to the determination that said data condition is not met; skip to the next instruction to be executed after said plurality of adjacent instructions without executing any intermediate instructions of said plurality of adjacent instructions not yet executed and continue execution on said next instruction.
[00043] An eighth aspect of the present invention provides a virtual machine provided by a computer program running in a data processing apparatus, said virtual machine providing an instruction execution environment in accordance with the data processing apparatus of a first aspect of the present invention.
[00044] The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of the illustrative embodiments which is to be read in conjunction with the accompanying drawings. BRIEF DESCRIPTION OF THE DRAWINGS
[00045] Figure 1 shows a data processing apparatus according to an embodiment of the present invention;
[00046] Figures 2a and b show an example of a skip instruction;
[00047] Figure 3 shows an example of a ring instruction;
[00048] Figure 4 shows states entered by a processing device in an instruction in standard detection mode;
[00049] Figure 5, schematically, shows a compiler according to an embodiment of the present invention;
[00050] Figure 6 shows a data processing apparatus according to a further embodiment of the present invention;
[00051] Figure 7 shows a flow diagram illustrating steps in a method for performing a skip operation according to an embodiment of the present invention;
[00052] Figure 8 shows a flow diagram illustrating steps in a method for performing a ring operation according to an embodiment of the present invention;
[00053] Figure 9 shows steps in a method for compiling a program including generating ring and skip instructions; and
[00054] Figure 10 shows a virtual machine implementation of an embodiment of the present invention. DESCRIPTION OF MODALITIES
[00055] Figure 1 shows a data processing apparatus 10 for processing an instruction sequence according to an embodiment of the present invention. The data processing apparatus 10 has a fast data retrieval instruction memory 12 for storing instructions to be processed. These instructions are captured by the capture unit 14 in dependence on the value in the program counter 40. The captured instructions are sent to the instruction decoder 16 for decoding and are then executed by one of the parallel execution lines 18. The capture unit , the decoding unit and the execution lines form the processing circuitry 30 for processing the instructions. There is also a data store 20 comprising registers holding control values controlling the execution of the instructions and an additional data store 35 comprising registers for storing data values that are processed by the instructions being executed. It should be noted that data warehouse 20 and data warehouse 35 may in some embodiments form a single bank of record.
[00056] In an embodiment of the present invention, the captured instructions include a skip instruction followed by a plurality of vector instructions which are all conditioned on a vector mask not being 0. If the vector mask is 0 then no operation is performed in any of the data elements in response to the plurality of vector instructions. The skip instruction contains a register specifier which identifies the register containing the vector mask and an instruction address which is the target address, i.e. the address of an instruction subsequent to the last of a plurality of vector instructions in the instruction sequence .
[00057] The decoder when decoding the skip instruction writes the value of the target address into register 22, and the address of the vector mask register into register 23 and it sets a skip flag into register 21. The skip flag indicates to processing circuitry 30 that a jump is pending and that if it is determined that the data condition is not met, in which case the vector mask register contains zeros, then program counter 40 must be set to the value in the target record 22.
[00058] Therefore, in response to the skip instruction the decoder writes these values to the data store 20 and one of the run lines 18 executes a test vector instruction which checks the value of the vector mask register to see if it is 0. The first conditional vector instruction in the consecutive conditional instructions is then executed on another of the parallel execution lines 18. The pick and decode unit continues to pick up and decode these consecutive instructions until the vector test operation determines a result for the vector mask. If the vector mask is 0 then the program counter is set to the target value of the skip instruction and this is the next instruction that is taken. If the vector mask is not 0 then the skip flag set in response to the skip instruction in data store 20 is reset to and the consecutive instructions are all executed.
[00059] It should be noted that although if the vector mask is 0 some instructions may have been executed that do not need to be executed while the vector mask register value was being determined, the vector mask being 0 means that no elements are stored from these operations and so, the data values in registers 35 are not overwritten and the processor state is not changed in that the state needs to be retrieved and the processor can simply jump to the target instruction.
[00060] Figures 2a and 2b show an example of the skip instruction and how the processor implements it. Therefore, SKIP P,L indicates an optional branch to an address L that depends on the condition P.
[00061] Following the skip instruction, if the processor is able to immediately resolve condition P the processor will immediately branch to handle L. If the processor is not able to resolve the condition immediately then it will continue executing the subsequent instructions that are conditional on a P value. The processor "remembers" that it has an unresolved jump by modifying the following three registers. It modifies the skip flag record by setting pending_skip to true. It modifies the target_skip register by setting it to L, L being the address to branch to and it sets the register_skip to P which in this case is the vector mask register.
[00062] Figure 2b shows how the processor acts for this instruction. While the pending skip flag is set (pending_skip = true) if the program counter reaches the target address of the skip instruction then the skip flag is reset to à (pending_skip = false) since at this point all consecutive instructions have been executed and therefore the jump is no longer pending. Also, if the condition that the skip instruction is dependent on is resolved, in that case if the vector mask register is 0 then the program counter is set to the target_skip address and all consecutive intermediate instructions are skipped. At this point the skip flag is reset to false again.
[00063] It should be noted that although setting the skip pending flag and allowing the processor not to process instructions whose conditions are not met will improve processor performance, this is not essential and the processor will still execute correctly if these instructions are not skipped. Therefore, at any point the processor can reset the skip flag. It can do this where the comparison to determine whether the condition is met or not is difficult to effect and is therefore unlikely to be performed before the instructions that are conditional on it. This is done by resetting the skip flag that the processor releases the registers that store the target address of the event mask and the flag.
[00064] It should be noted that although in the example given above the skip instruction specified an address, it could more properly specify a number of instructions or a number of bytes of instruction to skip. If this implementation was used then in response to executing the skip instruction a counter will be set to this number and will be decremented by one for each instruction of the consecutive instructions that are executed. When the counter reaches 0 someone has left the jump region and the jump flag is reset. If however the counter does not reach 0 and the vector mask was found to be 0 then the number of instructions left in the counter will be skipped, in other words the program counter will be incremented by this value.
[00065] An alternative to specifying the attribute register address would be where there is a global register used for vector masks. In this case, the register need not be specified within the jump instruction as you would know that the jump instruction was always referring to this register.
[00066] Figure 3 shows a variant of the skip instruction, the VLOOP instruction and the behavior this instruction has.
[00067] In this case the code of interest is a ring that iterates until a mask becomes all 0. This code differs in several main ways from the jump above code as it uses a backward branch and in this case we need to speculate that the branch is taken that again corresponds to speculating that the mask is not 0. Once again (as shown in Figure 4) the same registers as those used for the jump instruction in Figure 2 are used for the VLOOP instruction to indicate the target register and the assigned register but a different flag is set to indicate that it is a VLOOP that is more properly pending than the skip instruction. As in the skip instruction once it is determined that the attribute is all zero, the processor jumps to the target address. At this point the VLOOP flag is reset to false.
[00068] In the example just given the skip and ring operations were implemented architecturally in that they were new instructions. They could also be implemented in a microarchitectural way where an early stage decoder or a pre-decoder could spot a sequence of operations controlled by the same attribute and where its operation if the condition is not met does not change the architectural state it could. implements either the jump or lasso operation. It should be noted that detecting this type of pattern is particularly simple for vector instructions such as those set out in the introduction to this application where they are dependent on a global register and they all behave like no-ops if the global register is identically equal to 0 ,
[00069] Figure 4 illustrates, schematically, different states that the processing apparatus 10 would operate, in this standard detection mode. Thus, the pickup unit 14 picks up the instructions and an initial decoding stage 16 picks up a particular pattern code. In this example, the processor is operating in a normal "a" state and in response to detecting a vector test operation VTEST testing a particular vector mask register P it sets register_skip 22 to this vector mask value. It then proceeds to state b. If subsequent instructions are normal instructions it will return to an a state along the route *, as at this point it recognizes that the pattern of instructions it is detecting is not there. If however there is a branch instruction which is dependent on the assigned register P it will set register target_skip 23 as the target branch value, in other words L and proceed to state c. In state c it will process consecutive operations that are dependent on the vector mask register until the value of the vector mask register is determined. If the vector mask register value is 0 then it will proceed to state d setting the program counter to the value of target_skip. Then it will execute the instruction at this address and it will return to the normal state a. If during the operation of the consecutive instructions it was found that the vector mask was not 0 or the program counter rose above target_skip then it will return to a normal state along with the routes marked by *.
[00070] Figure 5 shows a compiler 50 according to an embodiment of the present invention. Compiler 50 takes a program written by a programmer and transforms it into code that is processed by the processor. When doing this it takes steps to optimize the code so that it runs more efficiently. In one embodiment of the present invention it performs pattern matching steps similar to those performed by the earlier stage of the decoder described with respect to Figure 4, and generates code that controls a process to perform the skip or loop functions as appropriate.
[00071] An additional instruction that the skip function can be used together with is the Thumb IT instruction which indicates that various operations must be performed depending on particular conditions. Thus, an ITTTGE instruction indicates that the next 3 operations must be performed if one value is greater than or equal to another one. As the results of these operations are not written back until the result of a condition is known, a skip flag could be set and it and if it was determined that the GE condition was not met before the first three operations were completed then the processor could skip to the fourth operation.
[00072] Figure 6 shows an alternative embodiment of the data processing apparatus 10, in which there is a register 24 within the register bank 20 which is configured to store vector pending vector masks. Therefore, the jump and lasso instructions do not specify a vector mask register. In this mode, rather than setting a target address in a register, the number of instructions that are conditioned on a condition, such as the vector mask for a skip instruction, is set in counter 42. As each of these instructions is executed a signal is sent to the counter and it is decremented by one. If the condition is determined not to be met while the conditional instructions are still being executed then program counter 40 is updated by adding the value in counter 42 thereto.
[00073] Figure 7 shows a flow diagram illustrating steps in a method for processing an instruction sequence according to an embodiment of the present invention. An instruction sequence is decoded, and the decoder detects a plurality of instructions that are dependent on the same data condition and whose execution when said data condition is not met does not change a state of the data processor. In response to skew detection, or in other embodiments in response to decoding a jump instruction, the decoder sets a jump flag and writes a target address of the jump operation to a target address register. This target address is specified in the skip instruction in case the decoder is decoding a skip instruction or in case it detects a plurality of instructions depending on the same data condition, it is the address of the instruction subsequent to those instructions.
[00074] The processor then further determines whether the resulting data condition has been determined or not. If it has not been determined then a first of the adjacent adjacent conditional instructions is executed. It is then determined whether the program counter is greater than the target address, if not then that the condition has not yet been determined a subsequent conditional instruction is executed and the program counter checked again.
[00075] If at any point the result of the condition is determined then if it is found, the skip flag is reset and execution of the conditional instructions is continued. If the condition is not met then the program jumps to the target address by setting the program counter to the target address and the skip flag is reset.
[00076] If however, before the condition result is determined the program counter reaches the target address then the skip flag is reset and execution of the instruction sequence continues in the normal manner.
[00077] Figure 8 shows a flow diagram illustrating steps in a method for processing an instruction sequence including a backward jump to a plurality of conditional instructions. An instruction sequence is decoded, and the decoder detects a jump back to a plurality of instructions that are dependent on the same data condition and whose execution when said data condition is not met does not change a state of the data processor. In response to detecting this, or in other embodiments in response to decoding a ring instruction, the decoder sets a VLOOP flag and writes a ring operation target address to a target address register. This target address is the current program counter incremented by one.
[00078] The processor then skips back to the first conditional instruction. It is then determined whether a condition result has yet been determined or not. If not, then the next conditional statement is executed and determined again. If the condition has been met, if it has been met then the VLOOP flag is reset and execution of the similar conditional instructions continues. If not found the program jumps to the target address and resets the VLOOP flag. If the condition is not determined while the conditional instructions are being executed then a branch back to the target address will occur in the program sequence at which point the VLOOP flag will be reset. These steps are not shown.
[00079] Figure 9 shows steps in a method for compiling a program. Instructions within the program are analyzed and it is determined whether there are a plurality of adjacent conditional instructions that are dependent on the same data and execution condition which do not change a state of the processor. If these are detected, it is determined whether there is a backward branch for these instructions. If there is none, the skip instruction is generated and placed in the instruction sequence followed by the plurality of conditional instructions. If there is a back branch then the ring instruction is generated.
[00080] Figure 10 illustrates a virtual machine implementation that can be used. While the above-described embodiments implement the present invention in terms of apparatus and methods for operating specific processing hardware and supporting the techniques involved, it is also possible to provide so-called hardware devices of virtual machine implementations. These virtual machine implementations run on a 530 host processor running a 520 host operating system supporting a 510 virtual machine program. Typically, powerful large processors are required to provide virtual machine implementations that run at a reasonable speed, but such an approach may be justified in certain circumstances, such as when there is a desire to run native code for another processor for compatibility or reuse reasons. The virtual machine program 510 provides an application program intertwined with an application program 500 that is the same as the application program interface that would be provided by the actual hardware that is the device being modeled by the virtual machine program 510. , program instructions, including the memory access control described above, can be executed within application program 500 using virtual machine program 510 to model its interaction with virtual machine hardware.
[00081] Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications may be made thereto by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims. For example, various combinations of the features of the following dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.
权利要求:
Claims (15)
[0001]
1. A data processing apparatus (10), comprising: an instruction decoder (16) configured to decode a sequence of instructions; a data processor (30) configured to process the decoded instruction sequence, said data processor being configured to analyze said instruction sequence, characterized in that the data processor (30) is configured to identify a plurality of adjacent instructions within said instruction sequence, the execution of which is dependent on a same data condition, which execution when said data condition is not met it does not change a value stored in any data register, and in response to identifying said plurality of adjacent instructions, said data processor is configured to:- begin to determine whether said condition of data is found or not;- start processing said plurality of adjacent instructions if m wait for a determination whether said data condition is met or not; and - in response to the determination that said data condition is not met; - skipping to the next instruction to be executed after said plurality of adjacent instructions without executing any intermediate instructions of the plurality of adjacent instructions not yet executed and without restoring said value in said data register, and continue execution in said next instruction.
[0002]
2. Data processing apparatus according to claim 1, characterized in that it comprises a data store, said data processing apparatus being responsive to said plurality of adjacent instructions for storing in said data store an identifier identifying a locating said next instruction to be executed after executing said plurality of adjacent instructions with a target location and to set a skip flag within said data store, said skip flag indicating to said processor that in response to said data not being found said processor must update a program counter with an address corresponding to said target location.
[0003]
3. Data processing apparatus according to claim 2, characterized in that said identifier identifying the location of said next instruction comprises an address.
[0004]
4. Data processing apparatus according to any one of claims 1 to 3, characterized in that said plurality of adjacent instructions comprise vector instructions for processing vector data elements, each vector instruction comprising a mask controlling that elements in said data elements are processed, said data condition not being found comprising a mask comprising zeros indicating that none of the data elements is to be processed.
[0005]
5. Processing apparatus according to any one of claims 1 to 3, characterized in that said plurality of adjacent instructions comprises a plurality of instructions each comprising the same attribute, said instructions only completing and updating any data values stored in response to said attribute being found, said data processing apparatus being configured to: - begin processing said plurality of adjacent instructions; and - in response to the determination that said data condition is not met; - terminate execution of said plurality of adjacent instructions so that no stored data is updated; - skip to a next instruction subsequent to said plurality of adjacent instructions without executing any intermediate instructions of said plurality of adjacent instructions not yet executed and to continue execution in said next instruction.
[0006]
6. Data processing apparatus according to claim 2 or 3, said data processing apparatus further comprising a data store and detection circuitry; said responsive detection circuitry for detection of a conditional branch instruction dependent on said data condition not being found to branch forward around said plurality of adjacent instructions to:- store a value indicating a target location of said branch instruction in said data store as said target location and to set the value of said skip flag within said data store; - said processor being responsive to said skip flag not to capture said branch and to start determining whether said data condition is met or not;- to start running the tioned plurality of adjacent instructions; and - in response to determining that said data condition is not met to set a program counter to a value dependent on said target location.
[0007]
7. Data processing apparatus according to claim 6, characterized in that said conditional branch instruction is conditional on a vector value, said processor begins said determination of whether said data condition is met or not executing a vector test instruction testing said vector value and in response to said vector value being determined to understand a predetermined value it being determined that said data condition is not met.
[0008]
8. Data processing apparatus according to claim 2 or 3, said data processing apparatus further characterized in that it comprises a data store and detection circuitry; - said detection circuitry being responsive detecting a conditional branch instruction dependent on said data condition to branch back to a start of said plurality of adjacent instructions to:- store a location of the branch instruction incremented by one in said data store as said target location and for setting a skip flag value within the data store; said processor being responsive to said skip flag for taking said branch and for determining whether said data condition is met or not; and - to start executing the plurality of adjacent instructions; and in response to determining that said data condition is not met to set a program counter to a value dependent on a value stored in said data store as said target location.
[0009]
9. Data processing apparatus according to claim 8, characterized in that said conditional branch instruction is conditioned to a vector value, said processor begins said determination of whether said data condition is met or not executing a vector test instruction testing said vector value and in response to said vector value being determined to understand a predetermined value it being determined that said data condition is not met.
[0010]
10. Data processing apparatus according to claim 2 or 3, characterized in that it further comprises a data store and detection circuitry; - said detection circuitry being responsive to the detection of an instruction indicating the said plurality of adjacent instructions dependent on said data condition follows said instruction, said instruction comprising said data condition and an indication whether it is to be met or not for each instruction to be completed;- to store an identifier identifying the location of said next instruction following said plurality of instructions as said target location and to set a skip flag value within the data store;- the processor being responsive to said skip flag to skip said target location in response to determining that the mentioned data condition does not is found.
[0011]
11. Data processing apparatus according to claim 2 or 3, characterized in that in response to the determination that said data condition is met, said processor is configured to continue executing said plurality of adjacent instructions and to clear the mentioned skip flag.
[0012]
12. Data processing apparatus according to claim 2 or 3, characterized in that in response to the execution of all said adjacent instructions before said data condition is determined said processor is configured to clear said to jump.
[0013]
13. Method for compiling an application for execution by a data processor, comprising the steps of:- analyzing a plurality of instructions within said application; the method characterized by comprising:- detecting whether said plurality of instructions includes a plurality of adjacent instructions , of which execution is dependent on a same data condition and whose execution, when said condition and data is not found, does not change a value stored in any data record; - if said plurality of instructions includes said plurality of adjacent instructions, then generates a sequence of instructions including a skip instruction followed by said plurality of adjacent instructions;- said skip instruction specifying a target address, the target address being an address of a next instruction subsequent to said adjacent instructions and said condition of data, the aforementioned skip instruction when executing given by the data processor controlling said data processor to begin determining whether said data condition is met or not; e- starting processing of said plurality of adjacent instructions without waiting for the determination whether the data condition is met or not; and - in response to the determination that said data condition is met; - skipping the next instruction subsequent to said plurality of adjacent instructions without executing any intermediate instructions of said plurality of adjacent instructions not yet executed without restoring said value in said data register to continue execution on the mentioned next instruction; and - in response to the determination that said data condition is not met, continuing to execute said plurality of adjacent instructions.
[0014]
14. Computer-readable storage medium, characterized in that it stores instructions which, when executed by a data processor, perform the steps of the method as defined in claim 13.
[0015]
15. Method of data processing, comprising the following steps: - decoding the sequence of instructions; - processing said decoded sequence of instructions; the method characterized by comprising: - in response to a plurality of adjacent instructions within said sequence of instructions , the execution of which is dependent on the same data condition, which execution when said data condition is not met does not change a value stored in any data record:- start determining whether said data condition is met or no; - begin processing of said plurality of adjacent instructions without waiting for the determination whether the data condition is met or not; and - in response to the determination that said data condition is not met; - skipping to the next instruction to be executed after said plurality of adjacent instructions without executing any intermediate instructions of said plurality of adjacent instructions not yet executed and without restoring said value in said data register and continue execution in said next instruction.
类似技术:
公开号 | 公开日 | 专利标题
BR112013019824B1|2021-06-15|DATA PROCESSING APPARATUS AND METHOD, METHOD FOR COMPILING AN APPLICATION FOR RUNNING BY A DATA PROCESSOR, AND, COMPUTER-READABLE STORAGE MEANS
JP4785213B2|2011-10-05|How to analyze computer performance data
US5974538A|1999-10-26|Method and apparatus for annotating operands in a computer system with source instruction identifiers
JP5904993B2|2016-04-20|Method, system, and computer program for debugging multithreaded code
JPH10254700A|1998-09-25|Processor performance counter for sampling execution frequency of individual instructions
JP5353227B2|2013-11-27|Information processing apparatus having performance measurement program, performance measurement method, and performance measurement function.
JP6236443B2|2017-11-22|Sequence control for data element processing during vector processing.
PT100205A|1994-04-29|METHOD AND SYSTEM FOR TRANSLATING A FIRST PROGRAM CODE FOR A SECOND COMPETITIVE PROGRAM CODE
US9262299B1|2016-02-16|Simulation observability and control of all hardware and software components of a virtual platform model of an electronics system
US9262305B1|2016-02-16|Simulation observability and control of all hardware and software components of a virtual platform model of an electronics system
US10061672B2|2018-08-28|Implementing random content of program loops in random test generation for processor verification
ES2882177T3|2021-12-01|Speculative Reproduction of Executable Code
TW201737066A|2017-10-16|Program loop control
TWI648624B|2019-01-21|Apparatus and method for managing history information for branch prediction
US20140019737A1|2014-01-16|Branch Prediction For Indirect Jumps
JP6942298B2|2021-09-29|Exception condition processing of vector operation instructions
TW201800937A|2018-01-01|An apparatus and method for generating and processing a trace stream indicative of instruction execution by processing circuitry
WO2016082601A1|2016-06-02|Method and apparatus for anomaly processing in running process of program
US8352714B2|2013-01-08|Executing watchpoint instruction in pipeline stages with temporary registers for storing intermediate values and halting processing before updating permanent registers
US11099958B2|2021-08-24|Instruction generation for validation of processor functionality
JP2013222392A|2013-10-28|Information processing device, information processing method, and program
JP6926921B2|2021-08-25|Compile program, compilation method and parallel processing device
US7917739B2|2011-03-29|Storage medium storing calculation processing visualization program, calculation processing visualization apparatus, and calculation processing visualization method
TW201737060A|2017-10-16|Program loop control
WO2018149495A1|2018-08-23|A method and system to fetch multicore instruction traces from a virtual platform emulator to a performance simulation model
同族专利:
公开号 | 公开日
EP2673703B1|2019-08-28|
GB2501211B|2016-08-17|
EP2673703A1|2013-12-18|
KR20140014143A|2014-02-05|
MY160644A|2017-03-15|
IL227476A|2017-05-29|
US20120204007A1|2012-08-09|
CN103348318B|2016-08-17|
WO2012107737A1|2012-08-16|
JP2014504770A|2014-02-24|
US9176737B2|2015-11-03|
GB2501211A|2013-10-16|
CN103348318A|2013-10-09|
KR101827747B1|2018-03-22|
IL227476D0|2013-09-30|
GB201313488D0|2013-09-11|
JP6267513B2|2018-01-24|
BR112013019824A2|2020-08-04|
引用文献:
公开号 | 申请日 | 公开日 | 申请人 | 专利标题

US5127091A|1989-01-13|1992-06-30|International Business Machines Corporation|System for reducing delay in instruction execution by executing branch instructions in separate processor while dispatching subsequent instructions to primary processor|
JPH04137169A|1990-09-28|1992-05-12|Nec Corp|Information processor|
GB2273377A|1992-12-11|1994-06-15|Hughes Aircraft Co|Multiple masks for array processors|
US5799180A|1995-10-31|1998-08-25|Texas Instruments Incorporated|Microprocessor circuits, systems, and methods passing intermediate instructions between a short forward conditional branch instruction and target instruction through pipeline, then suppressing results if branch taken|
JPH10320380A|1997-05-20|1998-12-04|Kofu Nippon Denki Kk|Vector processor|
JP2939248B2|1997-07-14|1999-08-25|松下電器産業株式会社|Branch prediction method and processor|
US6918032B1|2000-07-06|2005-07-12|Intel Corporation|Hardware predication for conditional instruction path branching|
WO2004001584A2|2002-06-24|2003-12-31|Ante Vista Gmbh|A method for executing structured symbolic machine code on a microprocessor|
US8904155B2|2006-03-17|2014-12-02|Qualcomm Incorporated|Representing loop branches in a branch history register with multiple bits|
US20100262813A1|2009-04-14|2010-10-14|International Business Machines Corporation|Detecting and Handling Short Forward Branch Conversion Candidates|US9335980B2|2008-08-15|2016-05-10|Apple Inc.|Processing vectors using wrapping propagate instructions in the macroscalar architecture|
US9342304B2|2008-08-15|2016-05-17|Apple Inc.|Processing vectors using wrapping increment and decrement instructions in the macroscalar architecture|
US9335997B2|2008-08-15|2016-05-10|Apple Inc.|Processing vectors using a wrapping rotate previous instruction in the macroscalar architecture|
US9389860B2|2012-04-02|2016-07-12|Apple Inc.|Prediction optimizations for Macroscalar vector partitioning loops|
US9116686B2|2012-04-02|2015-08-25|Apple Inc.|Selective suppression of branch prediction in vector partitioning loops until dependency vector is available for predicate generating instruction|
US9626185B2|2013-02-22|2017-04-18|Apple Inc.|IT instruction pre-decode|
US9348589B2|2013-03-19|2016-05-24|Apple Inc.|Enhanced predicate registers having predicates corresponding to element widths|
US9817663B2|2013-03-19|2017-11-14|Apple Inc.|Enhanced Macroscalar predicate operations|
GB2514618B|2013-05-31|2020-11-11|Advanced Risc Mach Ltd|Data processing systems|
US9830153B2|2014-06-20|2017-11-28|Netronome Systems, Inc.|Skip instruction to skip a number of instructions on a predicate|
US9519482B2|2014-06-20|2016-12-13|Netronome Systems, Inc.|Efficient conditional instruction having companion load predicate bits instruction|
US10175988B2|2015-06-26|2019-01-08|Microsoft Technology Licensing, Llc|Explicit instruction scheduler state information for a processor|
US10346168B2|2015-06-26|2019-07-09|Microsoft Technology Licensing, Llc|Decoupled processor instruction window and operand buffer|
GB2540941B|2015-07-31|2017-11-15|Advanced Risc Mach Ltd|Data processing|
US10055208B2|2015-08-09|2018-08-21|Oracle International Corporation|Extending a virtual machine instruction set architecture|
CN107179935B|2016-03-11|2021-01-29|华为技术有限公司|Instruction execution method and virtual machine|
KR101699491B1|2016-03-29|2017-01-24|주식회사 포워드벤처스|Device, method, and computer program for displaying function instructions|
CN107329936A|2016-04-29|2017-11-07|北京中科寒武纪科技有限公司|A kind of apparatus and method for performing neural network computing and matrix/vector computing|
US10860534B2|2016-10-27|2020-12-08|Oracle International Corporation|Executing a conditional command on an object stored in a storage system|
US10956051B2|2016-10-31|2021-03-23|Oracle International Corporation|Data-packed storage containers for streamlined access and migration|
US10884745B2|2017-08-18|2021-01-05|International Business Machines Corporation|Providing a predicted target address to multiple locations based on detecting an affiliated relationship|
US11150908B2|2017-08-18|2021-10-19|International Business Machines Corporation|Dynamic fusion of derived value creation and prediction of derived values in a subroutine branch sequence|
US10884747B2|2017-08-18|2021-01-05|International Business Machines Corporation|Prediction of an affiliated register|
US10534609B2|2017-08-18|2020-01-14|International Business Machines Corporation|Code-specific affiliated register prediction|
US10884746B2|2017-08-18|2021-01-05|International Business Machines Corporation|Determining and predicting affiliated registers based on dynamic runtime control flow analysis|
US10908911B2|2017-08-18|2021-02-02|International Business Machines Corporation|Predicting and storing a predicted target address in a plurality of selected locations|
US11150904B2|2017-08-18|2021-10-19|International Business Machines Corporation|Concurrent prediction of branch addresses and update of register contents|
US10719328B2|2017-08-18|2020-07-21|International Business Machines Corporation|Determining and predicting derived values used in register-indirect branching|
CN111124500B|2019-12-12|2022-03-08|浪潮电子信息产业有限公司|Instruction execution method, device, equipment and storage medium|
法律状态:
2020-08-18| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-01-05| B06A| Patent application procedure suspended [chapter 6.1 patent gazette]|
2021-04-06| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-06-15| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: PRAZO DE VALIDADE: 20 (VINTE) ANOS CONTADOS A PARTIR DE 26/01/2012, OBSERVADAS AS CONDICOES LEGAIS. |
优先权:
申请号 | 申请日 | 专利标题
US12/929667|2011-02-07|
US12/929,667|2011-02-07|
US12/929,667|US9176737B2|2011-02-07|2011-02-07|Controlling the execution of adjacent instructions that are dependent upon a same data condition|
PCT/GB2012/050161|WO2012107737A1|2011-02-07|2012-01-26|Controlling the execution of adjacent instructions that are dependent upon a same data condition|
[返回顶部]